Notice Formatter Service
The Notice Formatter Service is a pipeline that transforms raw notice content from the SuperSet portal into standardized, human-readable messages for multiple notification channels, primarily Telegram. It uses LLM-based classification, fuzzy matching, and structured extraction to produce consistent, audience-appropriate formatting across notice types, including job postings, webinars, hackathons, shortlistings, and general announcements.
The service plugs into the broader notification ecosystem and supports both automated scheduling and manual triggering via CLI commands. It keeps content safe through HTML cleaning, link handling, and Markdown/HTML rendering compatibility, while respecting Telegram’s character limits and formatting capabilities.
The Notice Formatter Service resides within the application’s services layer and interacts with data clients, database services, and notification channels through a well-defined dependency injection architecture.
The Notice Formatter Service is built around a LangGraph-based state machine that processes notices through distinct stages: text extraction, classification, job matching, enrichment, structured extraction, and final formatting. It maintains a compact set of helper utilities for date formatting, currency display, HTML breakdown parsing, and content prettification.
Key capabilities include:
Multi-stage LLM classification into categories (job posting, shortlisting, webinar, hackathon, announcement, update)
Fuzzy company name matching against structured job listings
Structured information extraction tailored to each notice category
Category-specific formatting templates with consistent Markdown/HTML output
Integration hooks for job enrichment callbacks
Content cleaning and sanitization for Telegram compatibility
The formatter operates as a stateful pipeline that transforms unstructured notice content into standardized messages. The architecture emphasizes modularity, allowing for easy extension to new notice types and integration with additional channels.
NoticeFormatterService Class
The core formatter implements a LangGraph StateGraph with six nodes representing the processing pipeline. Each node encapsulates a specific transformation step with explicit input/output contracts.
Content Transformation Pipeline
The pipeline transforms raw HTML content through multiple stages, each with specific responsibilities:
Text Extraction: Converts HTML to plain text while preserving structure
Classification: LLM-driven categorization using strict labeling rules
Job Matching: Entity extraction and fuzzy matching against job listings
Enrichment: Optional detailed job retrieval for matched entries
Structured Extraction: Category-specific JSON parsing with validation
Formatting: Template-based message composition with Telegram compatibility
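The six stages above can be sketched as a chain of functions that each read a shared state and return a partial update. This is a minimal illustration only: it uses plain Python rather than LangGraph, a keyword rule stands in for the LLM classifier, and every name here (NoticeState, KNOWN_COMPANIES, run_pipeline, the stage functions) is hypothetical rather than taken from the service.

```python
import re
from typing import Optional, TypedDict


class NoticeState(TypedDict, total=False):
    raw_html: str
    text: str
    category: str
    matched_job: Optional[str]
    job_details: dict
    extracted: dict
    message: str


# Hypothetical data; the real service matches against structured job listings.
KNOWN_COMPANIES = ["Acme Corp"]
KEYWORDS = {
    "shortlist": "shortlisting",
    "webinar": "webinar",
    "hackathon": "hackathon",
    "hiring": "job_posting",
}


def extract_text(state: NoticeState) -> dict:
    # Crude tag stripping, a stand-in for proper HTML parsing.
    text = re.sub(r"<[^>]+>", " ", state["raw_html"])
    return {"text": " ".join(text.split())}


def classify(state: NoticeState) -> dict:
    # Keyword rule standing in for the LLM classification step.
    lowered = state["text"].lower()
    for keyword, category in KEYWORDS.items():
        if keyword in lowered:
            return {"category": category}
    return {"category": "announcement"}


def match_job(state: NoticeState) -> dict:
    # Exact substring match here; the service describes fuzzy matching.
    lowered = state["text"].lower()
    for name in KNOWN_COMPANIES:
        if name.lower() in lowered:
            return {"matched_job": name}
    return {"matched_job": None}


def enrich(state: NoticeState) -> dict:
    # Enrichment is optional and only runs for matched jobs.
    if state.get("matched_job"):
        return {"job_details": {"company": state["matched_job"]}}
    return {}


def extract_structured(state: NoticeState) -> dict:
    return {"extracted": {"category": state["category"], "summary": state["text"][:80]}}


def format_message(state: NoticeState) -> dict:
    header = state["category"].replace("_", " ").title()
    return {"message": f"<b>{header}</b>\n{state['extracted']['summary']}"}


STAGES = [extract_text, classify, match_job, enrich, extract_structured, format_message]


def run_pipeline(raw_html: str) -> NoticeState:
    state: NoticeState = {"raw_html": raw_html}
    for stage in STAGES:
        state.update(stage(state))  # merge each stage's partial update
    return state
```

Keeping each stage as a function from state to a partial update makes it straightforward to test stages in isolation or to register them as nodes in a graph framework later.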
Category-Specific Formatting Templates
The formatter implements distinct templates for each notice type, ensuring consistent presentation across channels:
Job Posting Template
Header: “📢 Job Posting” with company and role
Key Sections: Location, CTC with package breakdown, eligibility criteria, hiring flow
Deadlines: Prominent warning with IST formatting
Links: Direct job details URL when available
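As a rough illustration of the job posting layout described above, the following sketch assembles a Telegram-HTML message from a dictionary. The field names (company, role, location, ctc, eligibility, hiring_flow, deadline, url) and both function names are assumptions made for this example, not the service's actual schema. IST is modeled as a fixed UTC+05:30 offset.

```python
from datetime import datetime, timedelta, timezone
from html import escape

# India Standard Time as a fixed offset (no daylight saving time applies).
IST = timezone(timedelta(hours=5, minutes=30), name="IST")


def format_deadline(deadline: datetime) -> str:
    """Render a timezone-aware datetime in IST."""
    return deadline.astimezone(IST).strftime("%d %b %Y, %I:%M %p") + " IST"


def format_job_posting(job: dict) -> str:
    """Build the message; optional sections are skipped when data is missing."""
    lines = [
        "<b>📢 Job Posting</b>",
        f"<b>{escape(job['company'])}</b> | {escape(job['role'])}",
    ]
    if job.get("location"):
        lines.append(f"Location: {escape(job['location'])}")
    if job.get("ctc"):
        lines.append(f"CTC: {escape(job['ctc'])}")
    if job.get("eligibility"):
        lines.append("Eligibility:")
        lines.extend(f"  • {escape(item)}" for item in job["eligibility"])
    if job.get("hiring_flow"):
        lines.append("Hiring flow: " + " → ".join(escape(s) for s in job["hiring_flow"]))
    if job.get("deadline"):
        lines.append(f"⚠️ <b>Deadline:</b> {format_deadline(job['deadline'])}")
    if job.get("url"):
        lines.append(f'<a href="{escape(job["url"])}">View job details</a>')
    return "\n".join(lines)
```

Escaping every interpolated value with html.escape keeps characters such as & and < in company names from breaking Telegram's HTML parsing.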
Shortlisting Template
Header: “🎉 Shortlisting Update”
Lists: Total shortlisted count and student names with enrollment numbers
Context: Company, role, location, package information
Process: Hiring flow steps when available
Webinar Template
Header: “🎓 Webinar Details”
Timing: Flexible date/time formatting with IST timezone
Venue: Platform or physical location
Registration: Direct link handling with deadline warnings
Hackathon Template
Header: “🏁 Hackathon”
Duration: Start/end date formatting
Structure: Theme, team size, prize pool, venue
Registration: Deadline and link handling
Announcement Template
Header: Bold title with passthrough content
Attribution: Author and timestamp footer
Minimal Processing: Light cleanup for readability
Integration with Telegram Message Formatting
The formatter produces content compatible with Telegram’s HTML parsing, with automatic fallback to MarkdownV2 escaping when needed. The Telegram service handles:
Character limit enforcement (4000 character chunks)
Automatic message splitting with line-aware boundaries
Markdown/HTML conversion with special handling for links and headers
Retry mechanisms with plain text fallback
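Line-aware splitting can be sketched as follows. This is one possible approach, not the Telegram service's actual code; split_message is a hypothetical name. The 4000-character default follows the chunk size stated above, which leaves headroom under the Telegram Bot API's 4096-character message limit.

```python
from typing import List


def split_message(text: str, limit: int = 4000) -> List[str]:
    """Split text into chunks of at most `limit` characters, breaking at line ends."""
    chunks: List[str] = []
    current = ""
    for line in text.splitlines(keepends=True):
        # A single line longer than the limit has to be hard-split.
        while len(line) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(line[:limit])
            line = line[limit:]
        # Start a new chunk when this line would overflow the current one.
        if len(current) + len(line) > limit:
            chunks.append(current)
            current = ""
        current += line
    if current:
        chunks.append(current)
    return chunks
```

Note that this sketch counts raw characters and does not track open HTML tags, so a tag pair that straddles a chunk boundary would still need separate handling.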
Content Cleaning and Safety Features
The formatter implements comprehensive content cleaning to ensure safe, readable output:
HTML Stripping: BeautifulSoup-based extraction with table parsing and paragraph handling
Line Normalization: Collapse excessive blank lines and trim trailing whitespace
Special Character Encoding: Proper handling of non-breaking spaces and Unicode characters
Link Preservation: Maintains hyperlinks while ensuring proper HTML anchor tags
Markdown Compatibility: Automatic escaping for Telegram MarkdownV2 special characters
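Two of the cleaning steps listed above, line normalization and MarkdownV2 escaping, can be sketched with the standard library alone. The function names are hypothetical. The escaped character set follows the Telegram Bot API's MarkdownV2 rules, with the backslash itself added as a precaution.

```python
import re


def normalize_lines(text: str) -> str:
    """Replace non-breaking spaces, trim trailing whitespace, collapse blank runs."""
    text = text.replace("\u00a0", " ")
    lines = [line.rstrip() for line in text.splitlines()]
    cleaned = "\n".join(lines)
    # Three or more consecutive newlines become a single blank line.
    cleaned = re.sub(r"\n{3,}", "\n\n", cleaned)
    return cleaned.strip()


def escape_markdown_v2(text: str) -> str:
    """Prefix each MarkdownV2 special character with a backslash."""
    return re.sub(r"([_*\[\]()~`>#+\-=|{}.!\\])", r"\\\1", text)
```

Escaping should be applied to plain text fragments before any intentional MarkdownV2 markup is added around them, otherwise the markup itself would be escaped.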
Integration with Notification Delivery System
Formatted notices are handed to the broader notification infrastructure for delivery.
The Notice Formatter Service maintains loose coupling with external dependencies through well-defined interfaces and typed data models.
The formatter is optimized for production use with several performance-conscious design decisions:
Selective Enrichment: Jobs are only enriched when matched during notice processing, avoiding unnecessary API calls
Fuzzy Matching Thresholds: Configurable similarity thresholds (80+) reduce false positives while maintaining accuracy
State Machine Efficiency: LangGraph minimizes memory overhead through state reuse
Batch Processing: Support for formatting multiple notices in sequence
Caching Opportunities: Job lookup dictionaries prevent repeated database queries
Common issues and their resolutions:
LLM Parsing Failures
Symptoms: Empty extracted fields or JSON parsing errors
Causes:
LLM output format inconsistencies
Complex HTML structures causing extraction ambiguity
Insufficient context for classification
Resolutions:
Verify LLM prompt templates are properly formatted
Check HTML content complexity and consider preprocessing
Monitor classification confidence scores
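One defensive pattern for the parsing failures above is to pull the JSON object out of the raw LLM reply and report which required fields came back empty, rather than letting an exception propagate. The helper names and the idea of a per-category required-field list are assumptions for illustration.

```python
import json
from typing import List, Optional


def parse_llm_json(raw: str) -> Optional[dict]:
    """Extract a JSON object from an LLM reply, tolerating code fences and chatter."""
    start = raw.find("{")
    end = raw.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        data = json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return None
    return data if isinstance(data, dict) else None


def missing_fields(data: dict, required: List[str]) -> List[str]:
    """List required keys that are absent or hold an empty value."""
    return [field for field in required if data.get(field) in (None, "", [])]
```

A caller could log the missing fields and fall back to the announcement-style passthrough when extraction returns None or too many fields are missing.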
Telegram Message Limits
Symptoms: Truncated messages or delivery failures
Causes:
Messages exceeding 4000 characters
Improper HTML/Markdown formatting
Resolutions:
Enable automatic message splitting in TelegramService
Validate message length before sending
Test with simplified content first
Job Matching Issues
Symptoms: Missed job matches or incorrect fuzzy matches
Causes:
Company name variations in notices vs. job listings
Low similarity threshold settings
Missing job enrichment callback
Resolutions:
Adjust fuzzy matching thresholds (currently >80)
Implement job enrichment callback for matched entries
Verify company name normalization
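To experiment with normalization and thresholds, a small stand-alone matcher can help. The sketch below uses difflib.SequenceMatcher from the standard library as a stand-in for whatever fuzzy-matching library the service uses, scaled to 0-100 so the threshold of 80 is comparable. The suffix list, the function names, and the choice of an inclusive comparison (score >= threshold) are all assumptions.

```python
import re
from difflib import SequenceMatcher
from typing import List, Optional

# Legal-form suffixes dropped before comparison (illustrative list only).
_SUFFIXES = re.compile(r"\b(pvt|private|ltd|limited|inc|llp)\b")


def normalize_company(name: str) -> str:
    """Lowercase, strip punctuation and legal suffixes, collapse whitespace."""
    name = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    name = _SUFFIXES.sub(" ", name)
    return " ".join(name.split())


def similarity(a: str, b: str) -> float:
    """Similarity of two company names on a 0-100 scale."""
    return 100 * SequenceMatcher(None, normalize_company(a), normalize_company(b)).ratio()


def best_match(candidate: str, known: List[str], threshold: float = 80.0) -> Optional[str]:
    """Return the closest known company, or None if nothing clears the threshold."""
    if not known or not normalize_company(candidate):
        return None
    score, name = max((similarity(candidate, k), k) for k in known)
    return name if score >= threshold else None
```

Printing similarity scores for a handful of known mismatches is a quick way to see whether a missed match comes from the threshold or from a name variation that normalization does not cover.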
The Notice Formatter Service provides a robust, extensible foundation for standardizing notice content across multiple channels. Its LLM-powered classification, structured extraction, and category-specific formatting ensure consistent, professional presentations while maintaining flexibility for future enhancements. The integration with Telegram’s formatting requirements and the broader notification ecosystem makes it a critical component of the system’s communication infrastructure.
The service’s modular design, comprehensive error handling, and performance optimizations position it well for scaling to additional notice types and notification channels as requirements evolve.